Efficient query-driven biclustering of gene expression data using Probabilistic Relational Models

Authors

  • Tim Van den Bulcke
  • Hui Zhao
  • Kristof Engelen
  • Tom Michoel
  • Bart De Moor
  • Kathleen Marchal
Abstract

Biclustering is an increasingly popular technique for identifying gene regulatory modules that are linked to biological processes. We describe a novel method, called ProBic, that was developed within the framework of Probabilistic Relational Models (PRMs). ProBic is an efficient biclustering algorithm that simultaneously identifies a set of potentially overlapping biclusters in a gene expression dataset and can be used in both a query-driven and a global setting. The model naturally deals with missing values, and robust sets of biclusters are obtained because noise is modeled explicitly. The maximum likelihood solution is approximated using an Expectation-Maximization strategy. ProBic was applied to various synthetic gene expression datasets; the results confirmed that it can successfully identify biclusters under various levels of noise, overlap, and missing values in both the query-driven and the global setting. Additional expert knowledge can be introduced through a number of prior distribution parameters, and the default settings were shown to be applicable to a wide range of datasets. Overall, our results on synthetic data show that PRMs can be used to identify overlapping biclusters in an efficient and robust manner.

Keywords: biclustering, probabilistic relational model, gene expression, regulatory module, expectation-maximization.
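To make the query-driven idea concrete, here is a minimal Python sketch. It is not ProBic: it does not use a Probabilistic Relational Model or prior distributions, and it finds only a single bicluster. It assumes a much simpler setup in which cells inside the bicluster follow a per-condition normal distribution with a fixed width and all other cells follow one background normal distribution. Starting from seed (query) genes, it alternates between re-estimating the per-condition means and hard-reassigning conditions and genes, in the spirit of an Expectation-Maximization loop. Missing values (`None`) are simply skipped. The names `toy_query_bicluster`, `sigma_bic`, and `DEMO` are hypothetical and chosen only for illustration.

```python
import math


def log_normal(x, mu, sigma):
    """Log density of a normal distribution with mean mu and std sigma."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)


def toy_query_bicluster(data, seed_genes, sigma_bic=0.2, max_iter=20):
    """Toy hard-assignment search for ONE bicluster (illustrative, not ProBic).

    data: list of gene rows; each cell is a float or None (missing value).
    seed_genes: row indices of the query genes used for initialisation.
    sigma_bic: assumed (fixed) noise width inside the bicluster.
    """
    n_genes, n_conds = len(data), len(data[0])

    # Background model: one normal fitted to every observed cell.
    observed = [x for row in data for x in row if x is not None]
    bg_mu = sum(observed) / len(observed)
    bg_sigma = math.sqrt(sum((x - bg_mu) ** 2 for x in observed) / len(observed))

    def gain(g, c, means):
        """Log-likelihood gain of explaining cell (g, c) by the bicluster."""
        x = data[g][c]
        if x is None:  # missing cells contribute nothing
            return 0.0
        return log_normal(x, means[c], sigma_bic) - log_normal(x, bg_mu, bg_sigma)

    genes, conds = set(seed_genes), set(range(n_conds))
    for _ in range(max_iter):
        # Re-estimate per-condition means from the current member genes.
        means = []
        for c in range(n_conds):
            vals = [data[g][c] for g in genes if data[g][c] is not None]
            means.append(sum(vals) / len(vals) if vals else bg_mu)

        # Hard reassignment: keep conditions, then genes, with positive total gain.
        new_conds = {c for c in range(n_conds)
                     if sum(gain(g, c, means) for g in genes) > 0}
        new_genes = {g for g in range(n_genes)
                     if sum(gain(g, c, means) for c in new_conds) > 0}

        if not new_genes or not new_conds:
            break  # nothing left to refine; keep the previous assignment
        if (new_genes, new_conds) == (genes, conds):
            break  # converged
        genes, conds = new_genes, new_conds
    return sorted(genes), sorted(conds)


# Hypothetical demo data: genes 0-3 share a pattern (values near 5) under
# conditions 0-2; all other cells are unrelated background, two are missing.
DEMO = [
    [5.1, 4.9, 5.0, -2, 2, 0],
    [4.9, 5.0, None, 0, -2, 2],
    [5.0, 5.1, 4.9, 2, 0, -2],
    [5.0, 5.0, 5.1, -2, 2, 0],
    [0, -2, 2, 0, -2, 2],
    [2, 0, -2, 2, 0, -2],
    [-2, 2, 0, -2, 2, 0],
    [0, -2, 2, None, -2, 2],
]

if __name__ == "__main__":
    print(toy_query_bicluster(DEMO, seed_genes=[0, 1]))
```

Seeding with genes 0 and 1 lets the loop grow the gene set to include genes 2 and 3 while discarding the conditions under which the seeds disagree. One limitation of this simplification: with a single seed gene, every condition trivially matches its own mean, so the sketch needs at least two query genes to be informative.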

Similar resources

Comparison of Biological Significance of Biclusters of SIMBIC and SIMBIC+ Biclustering Models

The query-driven biclustering model refers to the problem of extracting biclusters based on a query gene or query condition. The extracted biclusters consist of a set of genes and a subset of conditions that are similar to the query gene or query condition, and they include the query input itself. Two approaches applied to biclustering problems are top-down and bottom-up, based on how they tackle the p...

BayesStore: managing large, uncertain data repositories with probabilistic graphical models

Several real-world applications need to effectively manage and reason about large amounts of data that are inherently uncertain. For instance, pervasive computing applications must constantly reason about volumes of noisy sensory readings for a variety of reasons, including motion prediction and human behavior modeling. Such probabilistic data analyses require sophisticated machine-learning too...

Graphical models for biclustering and information retrieval in gene expression data

Aalto University, P.O. Box 11000, FI-00076 Aalto, www.aalto.fi. Author: José Caldas. Name of the doctoral dissertation: Graphical Models for Biclustering and Information Retrieval in Gene Expression Data. Publisher: School of Science. Unit: Department of Information and Computer Science. Series: Aalto University publication series DOCTORAL DISSERTATIONS 33/2012. Field of research: Bioinformatics. Manuscript ...

Efficient Mining Differential Co-Expression Constant Row Bicluster in Real-Valued Gene Expression Datasets

Biclustering aims to mine a number of co-expressed genes under a set of experimental conditions in a gene expression dataset. Recently, the differential co-expression biclustering approach has been used to identify class-specific biclusters between two gene expression datasets. However, it cannot handle differential co-expression constant row biclusters efficiently in real-valued datasets. In this pa...

A Trust Based Probabilistic Method for Efficient Correctness Verification in Database Outsourcing

Correctness verification of query results is a significant challenge in database outsourcing. Most of the proposed approaches impose high overhead, which makes them impractical in real scenarios. Probabilistic approaches are proposed in order to reduce the computation overhead pertaining to the verification process. In this paper, we use the notion of trust as the basis of our probabilistic app...

Publication date: 2008